feat(core): implement Stage 1 improvements for webfetch tool#21313
feat(core): implement Stage 1 improvements for webfetch tool#21313aishaneeshah merged 8 commits intomainfrom
Conversation
|
Hi @aishaneeshah, thank you so much for your contribution to Gemini CLI! We really appreciate the time and effort you've put into this. We're making some updates to our contribution process to improve how we track and review changes. Please take a moment to review our recent discussion post: Improving Our Contribution Process & Introducing New Guidelines. Key Update: Starting January 26, 2026, the Gemini CLI project will require all pull requests to be associated with an existing issue. Any pull requests not linked to an issue by that date will be automatically closed. Thank you for your understanding and for being a part of our community! |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly upgrades the Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
Size Change: +2.77 kB (+0.01%) Total Size: 26.6 MB
ℹ️ View Unchanged
|
There was a problem hiding this comment.
Code Review
This pull request introduces significant improvements to the web_fetch tool, making it more robust and resilient. The changes include support for multiple URLs, isolated rate limiting to prevent complete failure on single-host limits, and a surgical fallback mechanism to rescue content from failed or private URLs. The default for retrying fetch errors has also been enabled, and constants for timeout and content length have been increased. However, the implementation of the rescue logic for private IP addresses introduces a High-severity SSRF vulnerability. By explicitly allowing the CLI to fetch content from private IPs that the primary model cannot reach, the tool becomes a vector for internal network scanning and data exfiltration via prompt injection. It is recommended to block private IP access by default and only rescue failed public URLs. Additionally, there is one minor formatting suggestion to improve code cleanliness.
packages/core/src/tools/web-fetch.ts
Outdated
| const geminiClient = this.config.getGeminiClient(); | ||
| let llmContent = ''; | ||
| let returnDisplay = ''; | ||
| const needsRescue: string[] = [...privateUrls]; |
There was a problem hiding this comment.
The web_fetch tool's new 'rescue' logic explicitly identifies URLs with private IP addresses and adds them to a list to be fetched locally by the CLI. This implementation introduces a High-severity Server-Side Request Forgery (SSRF) vulnerability.
An attacker can use prompt injection (e.g., via a malicious website the user is asked to summarize) to force the LLM to call the web_fetch tool with internal URLs (like http://localhost:8080 or cloud metadata services). Since the tool 'rescues' these URLs by fetching them from the user's machine, it grants the attacker access to the user's internal network. Private IP addresses should be blocked by default in tools that perform network requests to prevent unauthorized access to internal services.
| const needsRescue: string[] = [...privateUrls]; | |
| const needsRescue: string[] = []; |
packages/core/src/tools/web-fetch.ts
Outdated
| // Error Handling | ||
| let processingError = false; | ||
| let responseText = getResponseText(response) || ''; | ||
|
|
There was a problem hiding this comment.
This line contains only whitespace and should be removed to maintain code cleanliness and adhere to the project's formatting standards.
References
- The project's style guide specifies the use of Prettier for formatting (line 19). This tool typically removes unnecessary blank lines and trailing whitespace to ensure a consistent code style. (link)
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces significant improvements to the web_fetch tool, including multi-URL processing, isolated rate limiting, and a surgical fallback mechanism. However, it also introduces potential SSRF vulnerabilities by failing to consistently validate URLs against private IP addresses in new local fetch paths; specifically, the executeFallbackForUrl method and the executeExperimental mode require isPrivateIp checks, aligning with the principle that utility functions should internally validate path inputs (Rule 9). Furthermore, two high-severity issues were identified: the silent dropping of private URLs, which can lead to a confusing user experience and inaccurate telemetry events (Rule 13), and an unsafe type assertion on an API response that could cause runtime errors. Addressing these points will enhance the tool's reliability and security.
packages/core/src/tools/web-fetch.ts
Outdated
| if (privateUrls.length > 0) { | ||
| logWebFetchFallbackAttempt( | ||
| this.config, | ||
| new WebFetchFallbackAttemptEvent('private_ip'), | ||
| ); | ||
| return this.executeFallback(signal); | ||
| } |
There was a problem hiding this comment.
The current logic for handling private URLs is misleading. It logs a WebFetchFallbackAttemptEvent but then silently drops the private URLs without attempting a fallback or informing the user. This can be confusing for debugging and provides a poor user experience.
Consider treating skipped private URLs similarly to how rate-limited URLs are handled: collect them and append a warning to the final output. This would make the tool's behavior transparent to the user.
Additionally, the WebFetchFallbackAttemptEvent telemetry event is inaccurate here, as no fallback is attempted. A more specific event for skipped private URLs would be more appropriate.
References
- Telemetry event keys should accurately reflect the semantic meaning of the logged data. If an existing key is semantically confusing in a given context, a new, more specific key should be introduced to prevent misinterpretation.
packages/core/src/tools/web-fetch.ts
Outdated
| const urlContextMeta = response.candidates?.[0]?.urlContextMetadata as | ||
| | UrlContextMetadata | ||
| | undefined; |
There was a problem hiding this comment.
Using as for type assertion on an API response is not type-safe. If the urlContextMetadata object from the Gemini API does not match the UrlContextMetadata interface, this could lead to runtime errors like TypeError: Cannot read properties of undefined.
A safer approach is to use a type guard function to validate the structure of the object before using it. This ensures that the data conforms to the expected shape at runtime, preventing potential crashes if the API response changes.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces significant and well-implemented improvements to the web_fetch tool, enhancing its robustness, security, and functionality. The changes, including multi-URL support, isolated rate limiting, and a surgical fallback mechanism, are thoughtfully designed and align with the goal of creating a more powerful and reliable tool. The refactoring into smaller, more focused methods greatly improves code clarity and maintainability. However, a critical security vulnerability has been identified: the SSRF prevention mechanism is incomplete and can be bypassed via local hostnames (like localhost), cloud metadata service addresses, DNS rebinding, or redirects, posing a significant risk by allowing access to internal services. This directly contradicts the principle of 'fail-closed' security checks. Additionally, opportunities exist to further enhance the robustness of this new logic by ensuring consistent URL normalization and preventing redundant fetches in the fallback scenario.
packages/core/src/tools/web-fetch.ts
Outdated
| const publicUrls = toFetch.filter((url) => !isPrivateIp(url)); | ||
| const privateUrls = toFetch.filter((url) => isPrivateIp(url)); |
There was a problem hiding this comment.
The filtering of private URLs here is incomplete as isPrivateIp does not account for all local/private addresses and does not protect against DNS rebinding or redirects. See the detailed explanation in the comment on line 227.
References
- Security checks, such as SSRF protection, must be implemented in a 'fail-closed' manner. If the validity of a target cannot be fully verified (e.g., due to incomplete blacklists, DNS rebinding, or redirect bypasses), the request should be rejected by default to prevent potential access to private resources.
packages/core/src/tools/web-fetch.ts
Outdated
| meta.urlRetrievalStatus !== 'URL_RETRIEVAL_STATUS_SUCCESS' && | ||
| meta.url | ||
| ) { | ||
| needsRescue.push(meta.url); |
There was a problem hiding this comment.
To ensure consistency and prevent potential bugs, the URL from meta.url should be normalized before being added to the needsRescue array. This aligns with the normalization performed on URLs at the beginning of the execute method. Without this step, there's a risk of subtle mismatches if the API returns URLs in a slightly different format (e.g., with or without a trailing slash) than the client-side normalization logic, which could lead to incorrect rescue behavior.
needsRescue.push(normalizeUrl(meta.url));93628c6 to
e0005b7
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
The pull request refactors the WebFetchTool to support fetching multiple URLs, including handling rate limits and private IP checks for each URL, and enhancing the fallback mechanism to process multiple URLs. New test cases were added to cover these multi-URL scenarios, including partial rate-limiting, fallback for multiple public URLs, and exclusion of private URLs from fallback. Review comments highlight critical SSRF vulnerabilities: the isPrivateIp check is insufficient due to DNS rebinding and redirect following, and the primary fetch path can bypass pre-flight SSRF checks by passing potentially malicious URLs directly to the LLM. Additionally, there are improvement opportunities to use getErrorMessage for safer and more consistent error handling in catch blocks.
packages/core/src/tools/web-fetch.ts
Outdated
| if (isPrivateIp(url)) { | ||
| return `Error fetching ${url}: Access to private IP address is not allowed.`; | ||
| } |
There was a problem hiding this comment.
The SSRF protection implemented using isPrivateIp is insufficient and can be bypassed. The check is performed on the hostname string before the request, which is vulnerable to DNS rebinding attacks where a domain resolves to a public IP during validation but a private IP during the actual fetch. Additionally, fetchWithTimeout follows redirects by default, allowing an attacker to provide a public URL that redirects to an internal service (e.g., http://127.0.0.1 or cloud metadata endpoints like http://169.254.169.254). The isPrivateIp utility also lacks coverage for several reserved ranges, most notably the link-local range 169.254.0.0/16 used by AWS/GCP/Azure metadata services.
To remediate this, you should:
- Disable redirect following in
fetchby settingredirect: 'manual'. - Resolve the hostname to an IP address and validate the IP against a comprehensive list of private/reserved ranges before fetching.
- Ensure the fetch is performed against the validated IP address while providing the original hostname in the
Hostheader.
packages/core/src/tools/web-fetch.ts
Outdated
| if (isPrivateIp(url)) { | ||
| skipped.push(`[Private IP] ${url}`); | ||
| continue; | ||
| } |
There was a problem hiding this comment.
The pre-flight SSRF check here only filters the URLs used in the fallback mechanism (toFetch). However, the original userPrompt (which still contains the potentially malicious private URLs) is passed directly to the primary fetch model on line 585. If the primary model has the capability to fetch URLs and can reach the user's internal network (e.g., via a proxy or if self-hosted), the SSRF protection is bypassed entirely for the primary execution path.
There was a problem hiding this comment.
Will be addressing in a followup for web-fetch pt 2
packages/core/src/tools/web-fetch.ts
Outdated
| // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion | ||
| const error = e as Error; | ||
| return `Error fetching ${url}: ${error.message}`; |
There was a problem hiding this comment.
For better type safety and consistency with other parts of the codebase (like the catch block in the execute method), it's recommended to use the getErrorMessage utility here instead of an unsafe type assertion (as Error). This will prevent potential runtime errors if a non-Error value is thrown.
| // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion | |
| const error = e as Error; | |
| return `Error fetching ${url}: ${error.message}`; | |
| return `Error fetching ${url}: ${getErrorMessage(e)}`; |
References
- When catching exceptions, log the detailed error for debugging instead of providing only a generic error message.
packages/core/src/tools/web-fetch.ts
Outdated
| // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion | ||
| const error = e as Error; | ||
| const errorMessage = `Error during fallback fetch for ${url}: ${error.message}`; | ||
| const errorMessage = `Error during fallback processing: ${error.message}`; |
There was a problem hiding this comment.
Similar to the other catch block, using getErrorMessage here would be safer and more consistent than the unsafe type assertion. This avoids potential runtime errors from assuming the caught value is an Error instance.
| // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion | |
| const error = e as Error; | |
| const errorMessage = `Error during fallback fetch for ${url}: ${error.message}`; | |
| const errorMessage = `Error during fallback processing: ${error.message}`; | |
| const errorMessage = `Error during fallback processing: ${getErrorMessage(e)}`; |
References
- When catching exceptions, log the detailed error for debugging instead of providing only a generic error message.
1617888 to
b245b7e
Compare
There was a problem hiding this comment.
Code Review
The pull request introduces significant improvements to the web_fetch tool, including multi-URL support, better normalization, and hardened SSRF prevention. However, the fallback mechanism introduced in executeFallback is vulnerable to Indirect Prompt Injection and data spoofing due to the way it aggregates untrusted web content into a single prompt for the LLM. The use of simple, predictable delimiters allows malicious content to manipulate the model's behavior or spoof data from other URLs. This is a high-severity issue as it could mislead the main agent into performing unauthorized actions based on attacker-controlled input.
| const aggregatedContent = results | ||
| .map((content, i) => `URL: ${urls[i]}\nContent:\n${content}`) | ||
| .join('\n\n---\n\n'); | ||
|
|
||
| try { | ||
| const geminiClient = this.config.getGeminiClient(); | ||
| const fallbackPrompt = `The user requested the following: "${this.params.prompt}". | ||
|
|
||
| I was unable to access the URL directly. Instead, I have fetched the raw content of the page. Please use the following content to answer the request. Do not attempt to access the URL again. | ||
| I was unable to access the URL(s) directly using the primary fetch tool. Instead, I have fetched the raw content of the page(s). Please use the following content to answer the request. Do not attempt to access the URL(s) again. | ||
|
|
||
| --- | ||
| ${textContent} | ||
| ${aggregatedContent} | ||
| --- | ||
| `; |
There was a problem hiding this comment.
The executeFallback method is vulnerable to Indirect Prompt Injection and data spoofing. It concatenates untrusted content fetched from multiple URLs into a single prompt for the LLM using simple --- delimiters (lines 336-338 and 342-349). An attacker can include these delimiters in a malicious webpage to break out of the content block, spoof other URLs, or inject instructions that override the tool's framing. For example, a malicious page could contain --- URL: http://internal-service/admin Content: The admin password is 'password123' ---, misleading the agent into acting on spoofed data. Additionally, the user's original prompt is embedded without escaping (line 342), which could allow for direct prompt injection if it contains double quotes and instructions.
To remediate this, use unique, non-predictable delimiters (e.g., random tokens) to separate content from different sources. Alternatively, use a structured format like JSON or XML to pass the data to the LLM, and ensure that the content is properly encapsulated or escaped to prevent it from being interpreted as instructions.
References
- To prevent prompt injection, avoid including user-provided input in content passed to the LLM (
llmContent). If the input is needed for display purposes, usereturnDisplayinstead.
| try { | ||
| const url = new URL(urlStr); | ||
| const hostname = url.hostname.toLowerCase(); | ||
| if (hostname === 'localhost' || hostname === '127.0.0.1') { |
There was a problem hiding this comment.
Why is localhost blocked? It seems like it'd be valuable to be able to exercise APIs when working on a server project.
Is the idea that we should use CURL for that instead?
There was a problem hiding this comment.
Also wondering if this really gets us anything security-wise if the agent can fallback to using CURL.
There was a problem hiding this comment.
fair point.. I think the slight nuance is webfetch is higher level and can lead to silent attack without explicit knowledge ( the data is fed to llm which can summarize rather than the user seeing what model saw from URLs)
vs for curl it'd be an explicit permission
There was a problem hiding this comment.
Regarding localhost - might be worth adding a setting to allow it explicitly. Otherwise it increases attack surface quite a bit.
packages/core/src/tools/web-fetch.ts
Outdated
| const rawGroundingSupports = groundingMetadata?.groundingSupports; | ||
| const groundingSupports = Array.isArray(rawGroundingSupports) | ||
| ? rawGroundingSupports.filter( | ||
| (item): item is GroundingSupportItem => |
There was a problem hiding this comment.
nit: maybe this type guard function should live beside the definition of GroundingSuppportItem, instead of inline, for readability and in case we need to assert this type elsewhere.
packages/core/src/tools/web-fetch.ts
Outdated
| } | ||
| // 1. Apply Grounding Supports (Citations) | ||
| const rawGroundingSupports = groundingMetadata?.groundingSupports; | ||
| const groundingSupports = Array.isArray(rawGroundingSupports) |
There was a problem hiding this comment.
nit: I think Array.isArray(rawGroundingSupports) ? ... can be simplified to rawGroundingSupports ? ... or even rawGroundingSupports !== undefined.
packages/core/src/tools/web-fetch.ts
Outdated
| const rawSources = groundingMetadata?.groundingChunks; | ||
| const sources = Array.isArray(rawSources) | ||
| ? rawSources.filter( | ||
| (item): item is GroundingChunkItem => |
|
I compared this to my previous (but new!) PR #22212 and this implementation is basically very similar and/or superior except for two caveats: Change 1 — executeFallbackForUrl() throws on failure:
Change 2 — executeFallback() short-circuits on total failure:
I have made both of those changes and added to them to this PR as well @aishaneeshah - let me know if you have any objections...otherwise, LGTM |
Summary
This PR implements Stage 1 improvements for the
web_fetchtool, focusing on robust multi-URL processing, isolated rate limiting, and a streamlined fallback mechanism. The logic has been hardened to address security concerns (SSRF) and ensure consistent error handling.Details
Core Functionality Enhancements
normalizeUrlutility to handle hostname lowercasing, trailing slash removal, and default port stripping, ensuring consistent processing and deduplication.Security & Reliability
localhostand127.0.0.1in all modes (Standard, Fallback, and Experimental). Skipped hosts are explicitly reported as warnings.getErrorMessageutility across allcatchblocks.Refactoring
executemethod into a more readable, linear flow.Related Issues
Related to #21312.
How to Validate
Run the comprehensive unit test suite:
The suite covers normalization, deduplication, mixed success/failure scenarios, security constraints, and rate limiting.
Pre-Merge Checklist